Add an azdo failure skill #123913

Open
lewing wants to merge 52 commits into dotnet:main from lewing:helix-failures

@lewing (Member) commented Feb 2, 2026

Summary

Adds an AI agent skill for analyzing Azure DevOps and Helix test failures across dotnet repositories. When asked to investigate CI failures, the skill teaches Copilot how to query APIs, extract failure details, and provide actionable recommendations.

Features

Core Functionality

  • PR and Build Analysis: Query by PR number or Azure DevOps build ID
  • Multi-repository Support: Works with dotnet/runtime, sdk, aspnetcore, roslyn, and others
  • Helix Integration: Extracts work item details, console logs, and artifacts
  • Local Test Support: Handles non-Helix tests (e.g., dotnet/sdk style)

Intelligent Analysis

  • Build Analysis Integration: Automatically fetches known issues from the Build Analysis PR check
  • Known Issue Search: Searches GitHub for issues with "Known Build Error" label
  • MihuBot Semantic Search: Optional integration with MihuBot's semantic database (-SearchMihuBot)
  • PR Change Correlation: Correlates failures with files changed in the PR
  • Canceled vs Failed Jobs: Distinguishes between actual failures and dependency cancellations
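The failed-vs-canceled distinction can be sketched from Azure DevOps build timeline data. This is a minimal sketch, assuming the input mimics records from the Build Timeline REST API (`_apis/build/builds/{buildId}/timeline`), which carry `type`, `name`, and `result` fields; `Split-JobResults` is a hypothetical helper, and the sample records are made up.

```powershell
# Hypothetical helper: separate jobs that actually failed from jobs that were
# canceled (typically because something they depend on failed first).
function Split-JobResults {
    param([object[]]$Records)

    # Only job-level records matter here; task records roll up into their job.
    $jobs = $Records | Where-Object { $_.type -eq 'Job' }
    [pscustomobject]@{
        Failed   = @($jobs | Where-Object { $_.result -eq 'failed' }   | ForEach-Object name)
        Canceled = @($jobs | Where-Object { $_.result -eq 'canceled' } | ForEach-Object name)
    }
}

# Made-up records shaped like timeline entries.
$records = @(
    [pscustomobject]@{ type = 'Job';  name = 'Build linux-x64'; result = 'failed' }
    [pscustomobject]@{ type = 'Job';  name = 'Test linux-x64';  result = 'canceled' }
    [pscustomobject]@{ type = 'Task'; name = 'Run tests';       result = 'failed' }
    [pscustomobject]@{ type = 'Job';  name = 'Build windows';   result = 'succeeded' }
)

$summary = Split-JobResults -Records $records
"Failed:   $($summary.Failed -join ', ')"
"Canceled: $($summary.Canceled -join ', ') (downstream of a failure, not a root cause)"
```

Reporting canceled jobs separately keeps the analysis focused on the jobs whose logs actually contain the root cause.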

Smart Recommendations

Provides actionable guidance at the end of the analysis:

| Recommendation | Meaning |
|---|---|
| NO RETRY NEEDED | All failures match known tracked issues |
| LIKELY PR-RELATED | Failures correlate with PR changes |
| POSSIBLY TRANSIENT | No clear cause; check the main branch |
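A minimal sketch of how such a three-way recommendation could be derived, assuming each failure object has hypothetical `MatchesKnownIssue` and `Text` properties. `Get-RetryRecommendation` is an illustrative name, and the script's actual heuristics may differ.

```powershell
# Hypothetical decision logic for the three recommendations in the table.
function Get-RetryRecommendation {
    param(
        [object[]]$Failures,      # objects with MatchesKnownIssue (bool) and Text (string)
        [string[]]$ChangedFiles   # paths changed in the PR
    )

    # Every failure is already tracked by a known issue: nothing new to investigate.
    $unmatched = @($Failures | Where-Object { -not $_.MatchesKnownIssue })
    if ($unmatched.Count -eq 0) { return 'NO RETRY NEEDED' }

    # Naive correlation: does an unexplained failure mention a file the PR touched?
    $names = $ChangedFiles | ForEach-Object { [System.IO.Path]::GetFileNameWithoutExtension($_) }
    foreach ($failure in $unmatched) {
        foreach ($name in $names) {
            if ($failure.Text.IndexOf($name, [StringComparison]::OrdinalIgnoreCase) -ge 0) {
                return 'LIKELY PR-RELATED'
            }
        }
    }
    return 'POSSIBLY TRANSIENT'
}
```

Name-based correlation like this is cheap but can produce false positives (a changed `Console.cs` "matches" any log line that mentions Console), so its output is a starting point for the agent's own analysis, not a verdict.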

Usage

```powershell
# Analyze PR failures
./scripts/Get-HelixFailures.ps1 -PRNumber 123445 -ShowLogs

# Other repositories
./scripts/Get-HelixFailures.ps1 -PRNumber 12345 -Repository "dotnet/sdk"

# With semantic search
./scripts/Get-HelixFailures.ps1 -PRNumber 123445 -SearchMihuBot
```

Example Prompts

Structure

.github/skills/azdo-helix-failures/
├── SKILL.md                 # Skill documentation
├── scripts/
│   └── Get-HelixFailures.ps1   # Main PowerShell script
└── references/
    ├── azdo-helix-reference.md    # API and build definition details
    └── manual-investigation.md    # Manual investigation steps

Related

Supersedes #123863: this PR includes a PowerShell script for automation rather than just documentation, plus additional features such as Build Analysis integration, known issue search, and smart retry recommendations.

Copilot AI review requested due to automatic review settings February 2, 2026 21:05

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds a new skill for retrieving and analyzing test failures from Azure DevOps builds and Helix test runs in the dotnet/runtime CI pipeline.

Changes:

  • Introduces documentation and tooling to help investigate CI test failures
  • Provides PowerShell script to query Azure DevOps and Helix APIs for failure information
  • Enables querying by build ID or PR number with optional detailed log fetching

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| .github/skills/azdo-helix-failures/SKILL.md | Documents the skill's purpose, usage examples, manual investigation steps, and common failure patterns |
| .github/skills/azdo-helix-failures/Get-HelixFailures.ps1 | PowerShell script that queries Azure DevOps for failed jobs and retrieves Helix console logs |

- Fix unnecessary backtick escaping in string interpolation
- Rename $matches to $urlMatches/$failureMatches to avoid shadowing automatic variable
- Add gh CLI dependency check with helpful error message
- Add -TimeoutSec parameter (default 30s) for API calls
- Add -MaxFailureLines parameter (default 50) for configurable output
- Improve Format-TestFailure to detect end of stack trace via empty lines
- Add Write-Verbose output for debugging
- Update SKILL.md with new parameters, prerequisites, and org/project documentation
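Two of the fixes listed above can be illustrated in isolation: failing fast when `gh` is missing, and not storing results in a variable named `$matches`. A minimal sketch; `Assert-Command` is a hypothetical helper, not a function from the script, and the log text is made up.

```powershell
# 1. Fail fast with a helpful message when a required CLI is missing.
function Assert-Command {
    param([string]$Name, [string]$InstallHint)
    if (-not (Get-Command $Name -ErrorAction SilentlyContinue)) {
        throw "'$Name' was not found on PATH. $InstallHint"
    }
}
# Example: Assert-Command -Name gh -InstallHint 'Install it from https://cli.github.com/'

# 2. Variable names are case-insensitive, so $matches IS the automatic $Matches
#    variable, and any later successful -match overwrites it. Use a distinct name.
$log = "see https://example.test/a and https://example.test/b"
$urlMatches = [regex]::Matches($log, 'https://\S+')
$null = 'unrelated' -match 'rel'   # overwrites $Matches, leaves $urlMatches alone
$urlMatches.Count                  # 2
```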
Copilot AI review requested due to automatic review settings February 2, 2026 21:54

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

- Add Extract-BuildErrors function to parse build logs for error patterns
- Add Get-FailureClassification function with known patterns:
  - macOS clang module cache/dsymutil issues
  - NativeAOT size regressions
  - NuGet package errors
  - Device infrastructure issues
  - Helix timeouts
  - C# and MSBuild compilation errors
- Expand Format-TestFailure patterns for better Helix log extraction
- For non-Helix failures, now extracts actual errors and provides
  classification, suggested action, and transient failure detection
- Add -Repository parameter to support repos other than dotnet/runtime
- Add -ContextLines parameter for error context
- Reorder error patterns (specific before general) to avoid overmatch
- Fix Select-Object ordering (First then Unique)
- Add classification to Helix test failures, not just build failures
- Expand Format-TestFailure to capture multiple failures (up to 3)
- Add new failure patterns:
  - OutOfMemoryException (transient)
  - StackOverflowException
  - Assertion failures
  - Test timeouts
  - Network connectivity issues
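The "specific before general" reordering matters because classification stops at the first matching pattern. A minimal sketch under that assumption; the patterns, categories, and transient flags below are illustrative (only the out-of-memory case is marked transient in the list above; the network flag is an assumption), and `Get-FailureCategory` is a hypothetical name distinct from the script's `Get-FailureClassification`.

```powershell
# Ordered table: first match wins, so specific patterns precede the generic 'error'.
$classifiers = @(
    @{ Pattern = 'System\.OutOfMemoryException';          Category = 'Out of memory';       Transient = $true  }
    @{ Pattern = 'StackOverflowException|Stack overflow'; Category = 'Stack overflow';      Transient = $false }
    @{ Pattern = 'error NU\d{4}';                          Category = 'NuGet error';         Transient = $false }
    @{ Pattern = 'error CS\d{4}';                          Category = 'C# compile error';    Transient = $false }
    @{ Pattern = 'error MSB\d{4}';                         Category = 'MSBuild error';       Transient = $false }
    @{ Pattern = 'connection (reset|refused)';             Category = 'Network connectivity'; Transient = $true }
    @{ Pattern = 'timed out|timeout';                      Category = 'Timeout';             Transient = $false }
    @{ Pattern = 'error';                                  Category = 'Generic error';       Transient = $false }
)

function Get-FailureCategory {
    param([string]$Line)
    foreach ($c in $classifiers) {
        if ($Line -match $c.Pattern) {
            return [pscustomobject]@{ Category = $c.Category; Transient = $c.Transient }
        }
    }
    return [pscustomobject]@{ Category = 'Unknown'; Transient = $false }
}

Get-FailureCategory 'src/Foo.cs(12,5): error CS0103: The name x does not exist'
```

If the generic `error` entry came first, every compiler and NuGet error would collapse into "Generic error", which is the overmatch the reordering avoids.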
Copilot AI review requested due to automatic review settings February 2, 2026 22:56

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

- Use ${LogId} syntax to prevent PowerShell parsing $LogId? as ternary
- Normalize line breaks in log content before extracting Helix URLs
- Update URL pattern to handle workitem names with special chars
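The two string-handling fixes above can be sketched as follows. The URLs and log text are made up for illustration. The braces matter because `?` is a legal character in PowerShell variable names, so an unbraced `$LogId?` is not read as `$LogId` followed by a literal `?`.

```powershell
# 1. Braces end the variable name before the '?' that starts the query string.
$LogId = 42
$url = "https://dev.azure.com/org/project/_apis/build/builds/1/logs/${LogId}?api-version=7.0"

# 2. In .NET regex multiline mode, '$' matches only before '\n', so a trailing '\r'
#    from CRLF logs breaks patterns that expect the URL to end the line.
$crlf = "Console log: https://helix.example/jobs/j1/workitems/w1/console`r`nnext line"
$lf   = $crlf -replace "`r`n", "`n"
[regex]::IsMatch($crlf, '(?m)https://\S+/console$')   # False
[regex]::IsMatch($lf,   '(?m)https://\S+/console$')   # True
```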
Copilot AI review requested due to automatic review settings February 2, 2026 23:10

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

@steveisok (Member)

It's looking like I should close #123863 in favor of yours :-)

Script improvements:
- Add -HelixJob parameter for direct Helix job queries
- Add -WorkItem parameter to query specific work items
- Add Get-HelixJobDetails, Get-HelixWorkItems, Get-HelixWorkItemDetails functions
- Show work item artifacts, machine name, duration, exit code
- List failed work items when querying a job without -WorkItem

Documentation improvements (from PR dotnet#123863):
- Add build definition IDs table (129, 133, 139)
- Add failure classification table with all patterns
- Add Helix API curl examples
- Add artifact download documentation
- Add environment variable extraction examples
- Add links to triaging guide, area owners, Helix swagger
- Document -HelixJob and -WorkItem parameters
- Add Extract-TestRunUrls function to parse 'Published Test Run' URLs
- Add Get-LocalTestFailures to detect non-Helix test failures
- Add classification for local xUnit test failures
- Update main flow to report local test failures with links
- Update SKILL.md with new documentation
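Extracting "Published Test Run" URLs can be sketched with a named regex group plus de-duplication. A minimal sketch, assuming log lines of the form `Published Test Run : <url>` with a `runId` query parameter; `Get-PublishedTestRunUrl` is a hypothetical name, and the sample log is made up.

```powershell
# Hypothetical extractor for test-run links in an Azure DevOps build log.
function Get-PublishedTestRunUrl {
    param([string]$LogText)
    $pattern = 'Published Test Run\s*:\s*(?<url>https://\S+)'
    [regex]::Matches($LogText, $pattern) |
        ForEach-Object { $_.Groups['url'].Value } |
        Select-Object -Unique            # the same run is often reported more than once
}

$log = @"
Published Test Run : https://dev.azure.com/org/project/_TestManagement/Runs?runId=101&_a=runCharts
Published Test Run : https://dev.azure.com/org/project/_TestManagement/Runs?runId=101&_a=runCharts
Published Test Run : https://dev.azure.com/org/project/_TestManagement/Runs?runId=102&_a=runCharts
"@

$urls   = @(Get-PublishedTestRunUrl -LogText $log)
# Pull out the numeric run IDs, e.g. for a follow-up test-results query.
$runIds = $urls | ForEach-Object { if ($_ -match 'runId=(\d+)') { [int]$Matches[1] } }
```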
Copilot AI review requested due to automatic review settings February 2, 2026 23:56

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.

- Add Get-AzDOTestResults function using az devops invoke
- Fetch actual failed test names when az CLI is available
- Show up to 10 failed test names with count
- Add Extract-HelixLogUrls function to parse Helix console log URLs
- Display work item names with direct log links for Helix failures
- Deduplicate URLs to avoid showing duplicates
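The "up to 10 names with a count" presentation can be sketched independently of how results are fetched. A minimal sketch assuming a plain list of test names that may contain duplicates (for example, one test failing on several legs); `Format-FailedTestSummary` is a hypothetical name.

```powershell
# Hypothetical formatter: de-duplicate first, then truncate, so repeated names
# don't consume the limited display slots.
function Format-FailedTestSummary {
    param([string[]]$TestNames, [int]$Max = 10)
    $unique = @($TestNames | Select-Object -Unique)
    $lines  = @("Failed tests: $($unique.Count)")
    # @() keeps an empty pipeline from appending a $null element.
    $lines += @($unique | Select-Object -First $Max | ForEach-Object { "  - $_" })
    if ($unique.Count -gt $Max) { $lines += "  ... and $($unique.Count - $Max) more" }
    $lines
}

$names = (1..12 | ForEach-Object { "Test$_" }) + @('Test1', 'Test2')
Format-FailedTestSummary -TestNames $names
```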
Copilot AI review requested due to automatic review settings February 3, 2026 02:04

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

All examples now use ./scripts/Get-HelixFailures.ps1 relative to the skill directory.
- Add try/catch to cache cleanup with verbose logging on failure
- Add comment explaining allowed chars in search term regex
- Use TryParse for build ID parsing instead of direct cast
- Validate headSha format (40 hex chars) before using in API call
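The two input-validation fixes above can be sketched as small helpers. A minimal sketch; `ConvertTo-BuildId` and `Test-CommitSha` are hypothetical names, not functions from the script.

```powershell
# Parse a build ID without throwing on bad input; return $null instead of a bogus cast.
function ConvertTo-BuildId {
    param([string]$Text)
    $id = 0
    if ([int]::TryParse($Text, [ref]$id) -and $id -gt 0) { return $id }
    return $null
}

# A full Git SHA-1 commit ID is exactly 40 hexadecimal characters; validating it
# before building an API URL keeps arbitrary text out of the request.
function Test-CommitSha {
    param([string]$Sha)
    return $Sha -match '^[0-9a-fA-F]{40}$'
}

ConvertTo-BuildId '1278676'   # 1278676
ConvertTo-BuildId '12abc'     # $null (no output)
Test-CommitSha ('a' * 40)     # True
```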
Copilot AI review requested due to automatic review settings February 4, 2026 05:30

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

- Fix FileName property in manual-investigation.md (was Name)
- Fix dotnet/sdk description: uses both local and Helix tests
@pavelsavara (Member)

I prefer to be able to iterate on the log files after they are downloaded:

  • I suggest setting CacheTTLSeconds to 1 hour.
  • I want to add a simple index to the cache, which would help me navigate which files are there.
  • I find the default output of the tool too overwhelming (for my context for sure, and maybe for Copilot's focus too). I want to introduce a -Silent flag that would redirect the detailed analysis the script performs into a log file that could be re-read later during analysis.

I'm pushing my improvements into https://github.com/pavelsavara/runtime/tree/helix-failures_pavel

@steveisok (Member) commented Feb 4, 2026

@jkotas @stephentoub @lewing @pavelsavara I think what we have in this skill is super useful, and if everyone agrees, we should get this in. Iterating in the PR is good, but I suspect we'll be able to offer improvements until the cows come home.

Follow-ups are cheap.

@lewing (Member, Author) commented Feb 4, 2026

> I prefer to be able to iterate on the log files after they are downloaded:
>
>   • I suggest setting CacheTTLSeconds to 1 hour.
>   • I want to add a simple index to the cache, which would help me navigate which files are there.
>   • I find the default output of the tool too overwhelming (for my context for sure, and maybe for Copilot's focus too). I want to introduce a -Silent flag that would redirect the detailed analysis the script performs into a log file that could be re-read later during analysis.
>
> I'm pushing my improvements into https://github.com/pavelsavara/runtime/tree/helix-failures_pavel

Think of the caching as a way to buffer network calls, not to preserve results. Longer durations caused a lot of stale-data issues because of the naive URL-based cache. If you want to preserve the info, tell the LLM that in the prompt. Similarly, you can ask for the output format you like in the prompt and/or combine this with another skill that structures things the way you prefer.
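A short-lived, URL-keyed cache of the kind described here can be sketched as below. This is a minimal sketch, not the script's implementation: `Get-CachedResponse`, the cache directory, and the caller-supplied `-Fetch` scriptblock are all hypothetical, and the 30-second default simply echoes the TTL mentioned in this discussion.

```powershell
# Hypothetical URL-keyed cache: repeat requests within the TTL are served from disk.
function Get-CachedResponse {
    param(
        [string]$Url,
        [scriptblock]$Fetch,   # does the real network call, e.g. { param($u) Invoke-RestMethod $u }
        [string]$CacheDir = (Join-Path ([System.IO.Path]::GetTempPath()) 'helix-cache-sketch'),
        [int]$TtlSeconds = 30
    )
    if (-not (Test-Path $CacheDir)) { New-Item -ItemType Directory -Path $CacheDir | Out-Null }

    # Hash the URL so any URL maps to a safe, fixed-length file name.
    $sha = [System.Security.Cryptography.SHA256]::Create()
    try { $bytes = $sha.ComputeHash([System.Text.Encoding]::UTF8.GetBytes($Url)) }
    finally { $sha.Dispose() }
    $path = Join-Path $CacheDir (([System.BitConverter]::ToString($bytes) -replace '-', '') + '.txt')

    # Serve from cache only while the entry is younger than the TTL.
    if (Test-Path $path) {
        $age = (Get-Date) - (Get-Item $path).LastWriteTime
        if ($age.TotalSeconds -lt $TtlSeconds) { return Get-Content -Path $path -Raw }
    }
    $content = & $Fetch $Url
    Set-Content -Path $path -Value $content -NoNewline
    return $content
}
```

Because the key is only the URL, a longer TTL keeps serving an old response even after the underlying build or Helix job has changed, which is the stale-data problem described above.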

@pavelsavara (Member) commented Feb 4, 2026

> If you want to preserve the info, tell the LLM that in the prompt.

Your script does not expose the full log files, so the LLM, or any other script trying to integrate with it, would have to guess the schema of your cache and lift the files from there (and get busted 30 seconds later by the cache purge).

Let's ignore my branch. On further reflection, I think it would be good to separate the download/graph-crawling ability of your script from its post-processing ability.

I have my own such scripts, but I thought we could develop this as a team, without forcing me into a particular style of log processing.

But as Steve said, this is very useful as it is and I don't want to block merging it.

@jkotas (Member) commented Feb 4, 2026

Can you share a log of what a successful use of this skill looks like?

I cloned this branch, started Copilot, and typed "Use azdo-helix-failures skill to analyze #123478". This is what I got: log.txt. I stopped it since it was clearly not converging.

@pavelsavara (Member)

> This is what I got: log.txt.

It had problems running the script, gave up, and tried to do the analysis without the script, "by bare hands":

> I'm getting shell issues. Let me use a different approach via GitHub APIs to analyze the PR status directly.

@pavelsavara (Member)

Nice session (with my branch) https://gist.github.com/pavelsavara/af35aedd4d0c7c23c3e1a9971af73a8d

@jkotas (Member) commented Feb 4, 2026

> It had problems running the script

How do I fix that? It fails with "file not found", but the file C:\runtime\.github\skills\azdo-helix-failures\scripts\Get-HelixFailures.ps1 does exist on my machine.

@pavelsavara (Member)

```
copilot help config
copilot help environment
copilot help permissions

code c:\Users\pavelsavara\.copilot\config.json
```

@lewing (Member, Author) commented Feb 4, 2026

Investigation: Script Path References

I reviewed the skill's script path references against the Anthropic Agent Skills documentation.

Finding: The current paths using ./scripts/Get-HelixFailures.ps1 are correct per the Agent Skills standard. The documentation states:

> You can reference helper scripts within your skill folder via path: ./scripts/name_of_script.py. Scripts are executed in controlled agent environments when triggered by prompts or agent workflow steps.

The paths in SKILL.md and reference docs are all consistent and follow this pattern.

Regarding @jkotas' issue: Looking at the log, every PowerShell command failed with "File not found" - including basic commands like Test-Path and Write-Host "Test". This suggests the issue is with the PowerShell/shell configuration on that machine rather than the skill's path references. The agent correctly constructed absolute paths like C:\runtime\.github\skills\azdo-helix-failures\scripts\Get-HelixFailures.ps1, but the shell itself couldn't execute.

Note: When using skills, agents may either:

  1. Change to the skill directory before running scripts
  2. Resolve relative paths from the skill's location

How the agent resolves these paths depends on the Copilot CLI implementation. The skill documentation correctly documents paths relative to the skill directory as specified by the standard.

@lewing (Member, Author) commented Feb 4, 2026

@jkotas I know it sounds odd, but try explaining to Copilot what it is doing wrong.

I checked other documentation as well:

  1. GitHub Docs: https://docs.github.com/en/copilot/concepts/agents/about-agent-skills
  2. VS Code Docs: https://code.visualstudio.com/docs/copilot/customization/agent-skills
  3. Agent Skills Specification: https://agentskills.io/specification

- Enhance description to include 'when to use' keywords per spec requirement
- Remove misleading note about relative paths (agents resolve from skill root)
Copilot AI review requested due to automatic review settings February 4, 2026 19:47

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

# Check if failures correlate with PR changes
$hasCorrelation = $false
foreach ($failure in $allFailuresForCorrelation) {
$failureText = ($failure.Errors + $failure.HelixLogs + $failure.FailedTests) -join " "

Copilot AI commented Feb 4, 2026

The recommendation logic at the end of the script uses string concatenation to join failure properties that may contain arrays. At line 1973, when Errors, HelixLogs, or FailedTests are arrays, the join operation will work correctly. However, there's a potential issue: if any of these properties are null or not initialized properly earlier in the code, this could cause unexpected behavior.

Consider adding null checks or ensuring these properties are always initialized as empty arrays when creating the failure objects throughout the script (e.g., at lines 1698, 1800, 1866).

Suggested change
$failureText = ($failure.Errors + $failure.HelixLogs + $failure.FailedTests) -join " "
$allFailureParts = @($failure.Errors) + @($failure.HelixLogs) + @($failure.FailedTests)
$failureText = ($allFailureParts | Where-Object { $_ }) -join " "
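The suggested change also fixes a case the comment does not mention: when `Errors` is a single string, `+` performs string concatenation rather than array concatenation, so adjacent values run together. A minimal sketch on a made-up failure object.

```powershell
# Made-up failure: Errors is a scalar string, HelixLogs is $null, FailedTests is an array.
$failure = [pscustomobject]@{
    Errors      = 'error CS0103'
    HelixLogs   = $null
    FailedTests = @('TestA', 'TestB')
}

# Original: string + array stringifies the array and appends it to the string.
$naive = ($failure.Errors + $failure.HelixLogs + $failure.FailedTests) -join " "

# Suggested: @() forces arrays; @($null) is a one-element array, hence the null filter.
$allFailureParts = @($failure.Errors) + @($failure.HelixLogs) + @($failure.FailedTests)
$failureText = ($allFailureParts | Where-Object { $_ }) -join " "

$naive        # error CS0103TestA TestB
$failureText  # error CS0103 TestA TestB
```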

@jkotas (Member) commented Feb 4, 2026

> I know it sounds odd, but try explaining to Copilot what it is doing wrong.

The problem was with permissions. I had to give Copilot extra permissions to make it work (thanks @pavelsavara for the tip). I am not sure whether I am comfortable doing that, given what I have seen it do when it went off track (creating Python scripts in random places, etc.), but that's a separate problem. I think I am going to stick to running it in the GitHub Copilot sandbox, at least for now.

With the extra permissions, it produced:

● PR #123478 Failure Analysis Summary

  PR Title: Add OpenStandardInputHandle, OpenStandardOutputHandle, and OpenStandardErrorHandle APIs

  Builds Analyzed

   - Build 1278676 (runtime): 1 failed job, 6 local test failures
   - Build 1278825 (runtime-extra-platforms): 25 failed jobs, 8 local test failures

  🔴 LIKELY PR-RELATED FAILURES

  The failures correlate with files changed by this PR:

   - src/libraries/System.Console/ref/System.Console.cs
   - src/libraries/System.Console/src/System.Console.csproj
   - src/libraries/System.Console/src/System/Console.cs

  Critical Error Pattern: Mono AOT compilation failures on tvOS/iOS with "Invalid IL code" errors:

   Unable to compile method 'System.ValueType:Equals (object)' due to: 'Invalid IL code ... IL_0000: ret'
   Unable to compile method 'System.Activator:CreateInstance (System.Type)' due to: 'Invalid IL code ... IL_0000: ret'

  This pattern indicates empty method bodies in System.Private.CoreLib that Mono AOT can't compile — likely the PR introduced method stubs that return nothing when they should have implementations.

  Known Infrastructure Issues (7 tracked)
...

This is an invalid conclusion: this PR has not modified System.Private.CoreLib, so it is very unlikely that it introduced invalid IL in CoreLib.

I have no problem with merging this if others find it helpful.

@lewing (Member, Author) commented Feb 4, 2026

> This is an invalid conclusion: this PR has not modified System.Private.CoreLib, so it is very unlikely that it introduced invalid IL in CoreLib.
>
> I have no problem with merging this if others find it helpful.

Try asking it to support its conclusions.

@steveisok (Member)

> I have no problem with merging this if others find it helpful.

I find it very useful, even though in some instances the conclusions it draws can be off to varying degrees. When that happens, conversing with it allows course correction and leads to a better place.

lewing and others added 2 commits February 4, 2026 15:05
The script's heuristic-based recommendations may be incomplete.
Instruct agents to review detailed findings and form their own
analysis before presenting results to users.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 4, 2026 23:16

Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment on lines +1846 to +1847
else {
# No Helix tasks - this is a build failure, extract actual errors

Copilot AI commented Feb 4, 2026

Missing closing brace: The if ($task.log) block that starts at line 1785 is never closed. There should be a closing brace after line 1842 (which closes the if ($logContent) block) and before line 1843 (which closes the foreach loop). This missing brace causes the else block at line 1846 to be incorrectly positioned outside the try block, resulting in a PowerShell syntax error.

Suggested change
else {
# No Helix tasks - this is a build failure, extract actual errors
else {
# No Helix tasks - this is a build failure, extract actual errors
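The brace nesting the comment describes can be sketched as a skeleton. This is hypothetical: `Invoke-FailureScanSketch`, its parameters, and the bodies are placeholders, not the actual script. It shows every `if`/`foreach` closing before the `else` that belongs to the outer `if`, all inside the `try`.

```powershell
# Hypothetical skeleton of the intended nesting; bodies are placeholders.
function Invoke-FailureScanSketch {
    param([object[]]$HelixTasks, [object[]]$BuildErrors)
    try {
        if ($HelixTasks) {
            foreach ($task in $HelixTasks) {
                if ($task.log) {
                    $logContent = $task.log
                    if ($logContent) {
                        "helix: $logContent"
                    }   # closes if ($logContent)
                }       # closes if ($task.log), the brace the review reports as missing
            }           # closes foreach
        }
        else {
            # No Helix tasks - this is a build failure, extract actual errors
            $BuildErrors | ForEach-Object { "build: $_" }
        }
    }
    catch {
        Write-Warning "Scan failed: $_"
    }
}
```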

5 participants